import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sea
from fbprophet import Prophet
from fbprophet.plot import plot_plotly
import plotly.offline as py
py.init_notebook_mode()
%matplotlib inline
plt.style.use('ggplot')
# Load data
df = pd.read_csv('test_restaurant.csv')
df.head()
df.info()
As we can see, date has dtype object, although we know it actually holds dates. We should therefore convert it to datetime before further analysis.
# Convert into datetime object
df['date'] = pd.to_datetime(df['date'])
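To illustrate what the conversion does, here is a minimal sketch on a made-up three-row frame (not the restaurant data), assuming the dates are ISO-formatted strings. The errors='coerce' argument is an optional choice that the notebook itself does not use; it turns unparseable entries into NaT instead of raising an error.

```python
import pandas as pd

# Hypothetical toy data standing in for the real 'date' column
toy = pd.DataFrame({'date': ['2009-01-01', '2009-01-02', 'not a date'],
                    'num_visitors': [12, 15, 9]})

print(toy['date'].dtype)  # strings, not datetimes

# errors='coerce' replaces entries that cannot be parsed with NaT
toy['date'] = pd.to_datetime(toy['date'], errors='coerce')

print(toy['date'].dtype)  # now a datetime dtype
n_bad = int(toy['date'].isna().sum())  # number of entries that failed to parse
print(n_bad)
```

Counting the NaT entries afterwards is a quick way to check whether any rows failed to parse.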
# Plot
plt.figure(figsize=(10, 7))
sea.lineplot(x=df['date'], y=df['num_visitors'])
From the plot, we can see that the number of visitors follows a yearly pattern and increases slightly each year. We also observe two values below 0. A negative number of visitors makes no sense, so we will remove them.
# Drop negative values
df = df[df['num_visitors']>=0]
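Before dropping rows, it can help to confirm how many are affected. The sketch below uses a small made-up frame (the values are hypothetical, not taken from test_restaurant.csv) to show the same boolean-mask filter together with a count of the negative rows.

```python
import pandas as pd

# Hypothetical toy data with two negative visitor counts
toy = pd.DataFrame({'date': pd.date_range('2009-01-01', periods=5, freq='D'),
                    'num_visitors': [20, -3, 25, -1, 30]})

# Boolean mask marking the invalid rows
negative = toy['num_visitors'] < 0
n_negative = int(negative.sum())
print(n_negative)

# Keep only the valid rows and rebuild a clean 0..n-1 index
cleaned = toy[~negative].reset_index(drop=True)
print(cleaned)
```

Resetting the index is optional; it only keeps positional lookups such as iloc and label lookups in agreement after the filter.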
Next, we will fit a time series model. At first I considered linear regression. However, a plain linear regression on the raw timestamp can only capture a straight-line trend and would miss the yearly pattern, so it is a poor fit here. Instead we will use fbprophet, a forecasting library developed by Facebook that is designed for time series with trend and seasonality.
# Rename columns
df.columns = ['ds','y']
# Model
m = Prophet(daily_seasonality=True)
m.fit(df)
# Forecast one day ahead
future = m.make_future_dataframe(periods=1)
forecast = m.predict(future)
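As far as I understand the defaults, make_future_dataframe keeps the historical dates and appends the requested number of extra periods at daily frequency. The pandas-only sketch below mimics that behavior on three made-up dates; it is an illustration of the idea, not Prophet's actual implementation.

```python
import pandas as pd

# Hypothetical history: three consecutive days ending on 2009-12-29
history = pd.DataFrame({'ds': pd.date_range('2009-12-27', periods=3, freq='D')})

# One extra daily step beyond the last observed date
last_date = history['ds'].max()
extra = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=1, freq='D')

# History plus the new date, analogous to the frame passed to m.predict
future_sketch = pd.concat([history[['ds']], pd.DataFrame({'ds': extra})],
                          ignore_index=True)
print(future_sketch)
```

This is why the last row of the forecast corresponds to the day after the final observation, while all earlier rows are in-sample predictions.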
# Generate interactive plot
fig = plot_plotly(m, forecast) # This returns a plotly Figure
py.iplot(fig)
The black dots are the actual values, while the blue curve shows the predicted values. We can zoom in and out freely. Next, we will get the forecasted value for one day ahead.
forecast[['ds','yhat_lower', 'yhat_upper','yhat']].tail(1)
As we can see, this model predicts the number of visitors on 2009-12-30 will be $50.959915$.
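As the last step, we analyze the performance of this model, using the Root Mean Squared Error (RMSE) as our metric. It is defined as follows, where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for observation $i$: $$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$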
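# Align in-sample predictions with the actual values (the last forecast row is the future date)
df2 = pd.merge(df, forecast[['ds', 'yhat']].iloc[:-1, :], on='ds', how='left')
# Root mean squared error between actual and predicted values
rmse = np.sqrt(np.mean((df2['y'] - df2['yhat'])**2))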
rmse
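To make the RMSE computation concrete, here is a small worked example on made-up numbers (four hypothetical actual and predicted values, unrelated to the restaurant data). It uses the same NumPy expression as the cell above.

```python
import numpy as np

# Hypothetical actual and predicted values
y = np.array([10.0, 12.0, 9.0, 11.0])
y_hat = np.array([11.0, 10.0, 9.0, 13.0])

# Per-observation errors: -1, 2, 0, -2
errors = y - y_hat

# Mean of squared errors is (1 + 4 + 0 + 4) / 4 = 2.25, so the RMSE is 1.5
toy_rmse = np.sqrt(np.mean(errors ** 2))
print(toy_rmse)
```

Because the errors are squared before averaging, the two errors of size 2 dominate the result, which is why RMSE is sensitive to outliers.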
In conclusion, fbprophet performs well on this data. The graph shows that most actual values fall within the neighborhood of the predicted values, with only a few outliers. The RMSE is also reasonable, although it is computed on the same data the model was fitted to, so it measures in-sample fit rather than accuracy on unseen data.